Object Detection


1. Overview

YOLOv11 is the latest-generation object detection model released by Ultralytics, delivering an exceptional balance between speed and accuracy. When deployed locally on NVIDIA Jetson devices (such as Orin Nano, NX, or AGX), YOLOv11 enables efficient, low-latency AI inference optimized for edge environments.

(Figure: YOLOv11 object detection overview)

This guide covers:

  • Preparing the environment and installing JetPack
  • Running YOLOv11 quickly using Docker
  • Installing YOLOv11 and its dependencies locally
  • Accelerating inference with TensorRT
  • Leveraging DLA acceleration and benchmarking performance

YOLOv11 runs exceptionally well on devices such as the Jetson Orin Nano, offering a powerful solution for real-time edge AI deployment.


2. Environment Preparation

Hardware Support

| Device | Supported JetPack Version | AI Performance |
|---|---|---|
| Jetson Nano | JetPack 4.6.x | 472 GFLOPS |
| Jetson Xavier NX | JetPack 5.1.x | 21 TOPS |
| Jetson Orin NX 16GB | JetPack 6.x | 100 TOPS |
| Jetson Orin Nano Super | JetPack 6.x | 67 TOPS |

For optimal results, it is recommended to use JetPack 5.1 or later and enable maximum performance mode:

sudo nvpmodel -m 0
sudo jetson_clocks

3. Quick Start via Docker (Recommended)

The fastest way to get started is by using the prebuilt Docker image provided by Ultralytics. Run the following commands:

sudo docker pull ultralytics/ultralytics:latest-jetson-jetpack6
sudo docker run -it --ipc=host --runtime=nvidia ultralytics/ultralytics:latest-jetson-jetpack6

This image includes YOLOv11, PyTorch, Torchvision, TensorRT, and all other required dependencies.
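
To confirm the container's GPU stack before running detections, you can execute a quick sanity check inside it. This is a minimal sketch, not part of the official image documentation:

# Quick sanity check inside the container (minimal sketch)
import torch
import ultralytics

print("Ultralytics:", ultralytics.__version__)        # YOLO11 package version
print("PyTorch:", torch.__version__)                  # Jetson-specific build
print("CUDA available:", torch.cuda.is_available())   # should print True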


4. Installing YOLOv11 Locally (Optional)

This option is intended for users who need a customized environment.

Step 1: Set Up the Python Environment

sudo apt update
sudo apt install python3-pip -y
pip install -U pip

Step 2: Install the YOLOv11 Package

pip install ultralytics[export]

Step 3: Install Compatible PyTorch and Torchvision

While ultralytics automatically installs PyTorch and Torchvision, the pip-installed versions are not compatible with ARM64-based Jetson platforms. Instead, you need to manually install PyTorch and Torchvision wheels prebuilt for Jetson.

Example: JetPack 6.1 + Python 3.10

pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-2.5.0a0+872d972e41.nv24.08-cp310-cp310-linux_aarch64.whl
pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.20.0a0+afc54f7-cp310-cp310-linux_aarch64.whl

Install cuSPARSELt to Resolve Torch 2.5.0 Dependency:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install libcusparselt0 libcusparselt-dev

Verify PyTorch Version and GPU Availability:

python3 -c "import torch; print(torch.__version__)" # 2.5.0a0+872d972e41.nv24.08
python3 -c "import torch; print(torch.cuda.is_available())" # True
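
You can verify Torchvision the same way (an optional check, not part of the original steps):

python3 -c "import torchvision; print(torchvision.__version__)" # should match the wheel, e.g. 0.20.0a0+afc54f7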

Step 4: Install ONNX Runtime (GPU)

All available onnxruntime-gpu packages, organized by JetPack version, Python version, and other compatibility details, are listed in the Jetson Zoo ONNX Runtime Compatibility Matrix. In this example, we download and install onnxruntime-gpu 1.20.0 with support for Python 3.10.

pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/onnxruntime_gpu-1.20.0-cp310-cp310-linux_aarch64.whl
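
After installation, a quick way to confirm GPU support is to list the available execution providers. If the GPU build is installed correctly, CUDAExecutionProvider (and typically TensorrtExecutionProvider on Jetson) should appear; this check is an optional sketch:

python3 -c "import onnxruntime as ort; print(ort.get_available_providers())"
# e.g. ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']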

5. Accelerating YOLOv11 Inference with TensorRT

Ultralytics supports exporting YOLOv11 models to TensorRT engine files (.engine) to significantly improve inference performance.

Python Example

from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="engine") # creates yolo11n.engine

trt_model = YOLO("yolo11n.engine")
results = trt_model("https://ultralytics.com/images/bus.jpg")
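
The export call also accepts precision options. Below is a hedged sketch using the Ultralytics export arguments half and int8; "coco8.yaml" is a placeholder calibration dataset that you would replace with your own:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# FP16 export: large speedup with minimal accuracy loss (see benchmarks below)
model.export(format="engine", half=True)

# INT8 export: fastest option, but needs calibration data and costs accuracy
model.export(format="engine", int8=True, data="coco8.yaml")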

CLI Example

# Export a YOLO11n PyTorch model to TensorRT format
yolo export model=yolo11n.pt format=engine # creates 'yolo11n.engine'
# Run inference with the exported model
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

6. Using DLA (Deep Learning Accelerator)

Some Jetson devices feature built-in DLA (Deep Learning Accelerator) cores that enable lower power consumption and improved parallel inference.

Python Example

model.export(format="engine", device="dla:0", half=True)

CLI Example

# Export a YOLO11n PyTorch model to TensorRT format with DLA enabled (only works with FP16 or INT8)
# Once DLA core number is specified at export, it will use the same core at inference
yolo export model=yolo11n.pt format=engine device="dla:0" half=True # dla:0 or dla:1 corresponds to the DLA cores
# Run inference with the exported model on the DLA
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

Note: Some model layers may not run entirely on the DLA and will fall back to GPU execution if unsupported.

7. Object Detection Example

import cv2
import time
from ultralytics import YOLO

# Load the TensorRT engine model (exported from YOLO11)
model = YOLO("yolo11n.engine")  # Replace with the path to your .engine model

# Open the USB camera (usually /dev/video0)
cap = cv2.VideoCapture(0)

# Set camera resolution (match model input size for best performance)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

# Check if the camera opened successfully
if not cap.isOpened():
    print("❌ Cannot open camera")
    exit()

print("📸 Real-time detection started. Press 'q' to quit.")

while True:
    # Read a frame from the camera
    ret, frame = cap.read()
    if not ret:
        break

    # Start time for inference
    t0 = time.time()

    # Run inference
    results = model(frame)

    # Plot the results (draw bounding boxes, labels, etc.)
    annotated = results[0].plot()

    # Calculate instantaneous FPS from the per-frame processing time
    t1 = time.time()
    fps = 1.0 / (t1 - t0)

    # Draw FPS on the frame
    cv2.putText(annotated, f"FPS: {fps:.2f}", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Show the annotated frame
    cv2.imshow("YOLO11 - TensorRT Real-time Detection", annotated)

    # Exit on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release camera and close display window
cap.release()
cv2.destroyAllWindows()

(Figure: YOLO11 real-time object detection output)
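
Beyond plotting, detections can be read programmatically via the Results API (boxes.xyxy, boxes.conf, boxes.cls, and model.names). The snippet below is an illustrative sketch to adapt inside the loop above:

# Inspect individual detections from a Results object (illustrative sketch)
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()   # bounding-box corners in pixels
    conf = float(box.conf[0])               # confidence score
    cls_id = int(box.cls[0])                # class index
    print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")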

8. Benchmark Performance Comparison

| Model Format | Orin Nano (ms) | mAP50-95 | Orin NX (ms) |
|---|---|---|---|
| PyTorch | 21.3 | 0.6176 | 19.5 |
| TorchScript | 13.4 | 0.6100 | 13.03 |
| TensorRT (FP16) | 4.91 | 0.6096 | 4.85 |
| TensorRT (INT8) | 3.91 | 0.3180 | 4.37 |

✅ TensorRT (FP16) achieves the best balance between speed and accuracy.
⚠️ INT8 offers the fastest inference speed but with a significant accuracy drop.


9. Optimization Tips

| Optimization | Recommended Command |
|---|---|
| Power Mode | sudo nvpmodel -m 0 |
| CPU/GPU Frequency | sudo jetson_clocks |
| System Monitoring | sudo pip install jetson-stats, then run jtop |
| Memory Management | Use swap efficiently, clean cache when needed |
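
jetson-stats also exposes a Python API, which is convenient for logging utilization while the detector runs. A minimal sketch, assuming jetson-stats is installed system-wide:

# Log Jetson utilization via the jetson-stats (jtop) Python API (minimal sketch)
from jtop import jtop

with jtop() as jetson:
    while jetson.ok():
        print(jetson.stats)   # dict of CPU/GPU/RAM/temperature readings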

10. Troubleshooting

| Issue | Solution |
|---|---|
| Cannot import PyTorch after installation | Make sure you're using the .whl package specifically built for Jetson |
| TensorRT inference is slower than expected | Check that jetson_clocks is enabled and use FP16 mode |
| Failed to pull Docker image | Ensure Docker is properly installed and run with --runtime=nvidia |
| No module named 'tensorrt' in virtual env | Copy the package from the host into the venv: cp -r /usr/lib/python3.10/dist-packages/tensorrt your_venv/lib/python3.10/site-packages/ |
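
After copying the TensorRT package into the virtual environment, you can confirm it is importable (a quick optional check):

python3 -c "import tensorrt; print(tensorrt.__version__)"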
